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^ I Abstract 

I We present a practical implementation of the ensemble Kalman (EnKF) fil- 

' ter based on an iterative Sherman-Morrison formula. The new direct method 

exploits the special structure of the ensemble-estimated error covariance ma- 
trices in order to efficiently solve the linear systems involved in the analysis 
^ I step of the EnKF. The computational complexity of the proposed implemen- 

^ ■ tation is equivalent to that of the best EnKF implementations available in the 

- - - literature when the number of observations is much larger than the number of 

ensemble members. Even when this conditions is not fulfilled, the proposed 
method is expected to perform well since it does not employ matrix decomposi- 
tions. Computational experiments using the Lorenz 96 and the oceanic quasi- 
geostrophic models are performed in order to compare the proposed algorithm 
with EnKF implementations that use matrix decompositions. In terms of ac- 
curacy, the results of all implementations are similar. The proposed method 
is considerably faster than other EnKF variants, even when the number of 
observations is large relative to the number of ensemble members. 
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1 Introduction 

The ensemble Kalman ffiter (EnKF) is a well-established, sequential Monte Carlo 
method to estimate the state and parameters of non-linear, large dynamical mod- 
els [9] such as those found in atmospheric [21], oil reservoir [11], and oceanic [14] 
simulations. The popularity of EnKF owes to its simple conceptual formulation and 
the relative ease implementation [10]. EnKF represents the error statistics by an 
ensemble of model states, and the evolution of error statistics is obtained implicitly 
via the time evolution of the ensemble during the forecast step. In the analysis step, 
information from the model and the measurements is combined in order to obtain 
an improved estimate of the true vector state. This process is repeated over the ob- 
served time period. In typical data assimilation applications, the dimension of state 
space (number of variables) ranges between (9(10^) and (9(10^), and the dimension 
of the observation space between O(IO^) and C(10''). Consequently, the dimension 



2 



of the linear systems solved during the analysis step is very large, and the compu- 
tational cost considerable. In order to address this challenge we propose an efficient 
implementation of the EnKF analysis step based on an iterative application of the 
Sherman-Morrison formula. 

The paper is structured as follows. Section 2 discusses the conceptual formula- 
tion of the EnKF and several efficient implementations available in the literature. 
Section 3 presents the novel implementation of the EnKF based on iterative Sherman- 
Morrison formula, in which the special structure of the measurements error covariance 
matrix is exploited. Section 4 reports numerical results of the proposed algorithm 
applied to the Lorenz 96 and quasi-geostrophic models. Conclusions are presented 
in Section 5. 



2 Formulation of the EnKF 

EnKF consists of two steps: the forecast and the analysis. An EnKF cycle starts 
with the matrix G ]R'^st^t<=^'^<=ns whose columns xP € ]R,"=t'it<= ^ ^ form an ensemble of 
model states, all corresponding to the same model time tcurrcnt: 

X^^ = (xP,xf,...,x^J GR^--x^--. 

Typically xf is an ensemble of model forecasts. Here Ngtate is the size of the model 
state vector, and Nens is the number of ensemble members. Each ensemble member 
xf differs from the true state of the system x*''"° G JR^^statoXi^ g^^^ denote by 
G ]R,'^st=it° ^ 1 the corresponding error. The statistics of the ensemble of states is 
consistent with the background probability distribution. 

The ensemble mean x^ G ]R"=t=it°^i and the ensemble covariance matrix G 
j^NstatoXNstato ^^j^ wrlttcu as follows! 



x^ 



-B 



-1 Ncns 1 



Ncns ^ N, 
1=1 



X" = x^®1J_^iGR'^=--^'^-, (lb) 
pB ^ ^ ^ ^ . j^x^ - X"") ■ (X^ - X"")^ + Q G E'^=--x'^--<= . (ic) 

Here Inohsxi £ jR^cnsXi ^ vector whose entries are all equal one. Q is the covariance 
matrix of model errors. In the typical case where X^ is an ensemble of model 
forecasts, the explicit addition of the matrix Q to the covariance formula is not 
necessary. Instead, the effect of model errors can be accounted for by adding random 
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vectors ~ A/" (0, Q) to model states: xf ^ xf + Prior to any measurement, the 
forecast step provides the best estimation to the true vector state x*''"'^ [27]. 

The vector of observations y G R'^°bsXi jg available at tcurrcnt, where Nobs is the 
number of data points. The observations are related to the model state by the 
relation 

y = Hx + f 

where H G IR'^otsX Nstatc jg ^i^Q observation operator which maps the model space 
state into the observed space, and f ~ A/" (0, R) is a vector of observation errors, 
accounting for both instrument and representativeness errors. 

In order to account for observation errors one forms the matrix Y G JR^obaXNcns 
whose columns yi G ]R^°bs^i are perturbed measurements [15]: 

Y = (y + ei,y + e2,...,y + eN,„J 
= (yi,y2,...,yN„jGR'^°-""-% 

The vectors ej G R'^ots^^ represent errors in the data, and are drawn from a normal 
distribution ~ Af {0, R). We denote 

Y=(ei,e2,...,e,_)GR'^°-^"-% 

and the ensemble representation of the measurements error covariance matrix is 

R = — - — (r-r^)e R«°bsXN„b. _ 
Nobs - 1 ^ ^ 

The EnKF analysis step produces an ensemble of improved estimates (analyses) 
X"^ G IR'^statcXNcns applying the Kalman filter to each of the background ensemble 
members: 

X^ = X^ + K ■ (Y - H ■ X^) G RN^tateXNens^ ^2) 

K = P^-H^- (H-P^-H^ + R)"^ GR''=*^'^^''°'^% (3) 

where the matrix K G Restate xNobs jg ^]^q Kalman gain and quantifies the contribution 
of the background-observations difference to the analysis. 

The EnKF forecast step uses the dynamical model operator Ai to evolve each 
member of the ensemble X^ from the current time ^current to the next time tnext 
where observations are available: 

X^(tnext) = -Mt.....t^*ne.t (X^(teurrcnt)) G R'^--^'^-- . (4) 

The forecast ensemble X^ is the background for the new EnKF cycle at tncxt- The 
analysis and forecast steps are repeated. 
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2.1 Efficient implementations of the analysis step 

From equations (2)-(3) the analysis step can be written as 

= + ■ ■ Z , (5) 
where Z G IR'^obs^^ds is the solution of the following linear system: 

(H ■ ■ H'^ + R) -Z = (Y - H ■ X^) e R'^°b=><'^- . (6) 



A direct solution of this linear system can be obtained using the Cholesky decomposi- 
tion for matrix inversion [17, 25, 30]. While this is a numerically stable and accurate 
approach [12, 20, 26], its application to (6) leads to the following complexity [19] of 
the analysis step: 

O + Nobs ■ Ncns + Nobs ■ nL + Nstate " N^) • (7) 

This is an acceptable complexity for a large number of degrees of freedom (Nstate)? 
but not for a large number of observations (Nobs)- An alternative is to solve (6), and 
the overall analysis step, using Singular Value Decomposition (SVD) based methods. 
Those methods exploit the special structure of the data error covariance matrix R, 
which is often (block) diagonal and can be easily factorized: 



R = diag (ri, r2, . . . ,r 



' ' Nobs 



Furthermore, the observation operator H G ]R,NobsX Nstate jg sparse or can be applied 
efficiently to a state vector. Then, we can express the system matrix (6) as follows: 



S = ( X^ - X^) G R'^=t-t°x'^°- 



w = Vr 



T 

Vb^ • h ■ s ■ (h ■ s) ■ Vr^ + 1 



R. 



.Nens - 1 

Employ the singular value decomposition 

VbF^ ■ H ■ S = U ■ S ■ G R'^ot.sXNens^ (^g) 

where U G ]R'^obsX'^°bs and V G R^cnsXNens g^j,g orthogonal square matrices, and E = 
diag(cri, a2, ■ ■ ■ , CN^ns) ^ ]R,^obsX^<=ns is the diagonal matrix holding the singular values 
of W G R^obsXNobs^ fjj^g linear system (6) can be written as follows [19]: 



U ■ ( — + iV ■ Z = (Y - H • X^) G R''' 

VNens-l / ^ ^ 



obs Ncns 



(10) 



which yields the solution 



Z = VbF^ ■ U ■ diag 




■ U 



■ VbF^- (y-h-x^) . (11) 



The overall complexity of the analysis step 



■ Nobs + + Nstate " ^L) 



(12) 



is suitable for large Nstato and Nobs, assuming Npns remains small. 

Many algorithms in the literature employ SVD for either the solution of the linear 
system (6), or on the time evolution of the ensemble covariance matrix (which is 
performed by advancing in time the member deviations). Algorithms in this family 
include direct methods based on Sherman-Morrison identity, serial methods, the 
ensemble adjustment Kalman filter (EAKF), and the ensemble transform Kalman 
filter (ETKF); a complete description is given in [28]. All these methods have the 
overall complexity (number of long operations) given in (12). 

A different approach is to employ iterative methods for solving the linear system 
(6), for instance the conjugate gradient method [24, 13, 7, 8] for Nens right-hand sides. 
However, each iteration costs O (n^^^, ■ Nobs), therefore iterative methods do not seem 
to be competitive for the solution of (6). 

The well-established EnKF implementations presented above employ a Cholesky 
or SVD decomposition, which require considerable computational effort. The next 
section discusses an efficient implementation of the ensemble Kalman filter which 
does not require any decomposition prior to the solution of the linear system (6). 



3 Iterative Implementation of the EnKF Analysis 
Step 



We make the assumptions [28, 19] that, in practice, the data error covariance matrix 
R has a simple structure (e.g., block diagonal) and the observation operator H is 
sparse or can be applied efficiently. Moreover, in real applications the number of 
observations Nobs is much larger than the number of ensemble members Nens, but 
smaller than the size Ngtate of the vector state. 

We define the matrix of member deviations S G iR'^statoXNcns follows: 



S 




1 



Nstato XN, 



(13) 
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which allows to write the ensemble covariance matrix as 

pB _ g _ gT ^ J^NstatoXNatate (14) 

By replacing equation (14) in (6), the linear system solved during the analysis step 
is written as follows: 

D = Y-H-X^ e R''°'^=^''^"% (15a) 
V = H-SgR''°^=^''"'% (15b) 
(R + V ■ V^) ■ Z = D e E"°b=><«- . (15c) 

Note that 

Nens 

W = R + V = R + ^Vi 

i=l 



can be computed recursively via the sequence of matrices W^"^-* G ]RN°bs>^"obs: 
W(°) = R, 

W^'^) = R+X:vi-v;^ = W(^-^) + Vk-v^ l<k<Ne,., 



i=l 

therefore 



W = W('^-) = R + 52vi-v^T = W('^--i)+v,_-vJ_. (16) 

i=l 

By replacing equation (16) in (15c) we obtain: 

(^(Nens-l) ^ ^^^^^^ . J . ^ ^ . (17) 

The linear system (17) is solved using the Sherman-Morrison formula [1]: 

(A + L ■ M ■ N)"^ = - A"^ ■ L ■ (M"^ + N ■ A ■ L)"^ ■ N ■ A"^ (18) 

with A = (w("-=-i))^\ L = 1, M = VNcns and N = v^^^^. The solution of (17) is 
computed as follows: 

O \ Ncns ^ / Ncns ' ^ 



where F^'^<="=) G ]R'^°bsXNons g^,^^ g(Nons) g jj^NobsXNcns q^j-q giyeii hj the solution of the 
following linear systems: 

-y^(Ncns-l) . p(Ncns) — J) ^ J^Nobs X Ncns ^ (20a) 

. g(Ne„s) ^ ^^^^^ ^ J^N.bs X 1 _ (^20b) 

Note that (20a) can be written as follows: 

^(N.„e-1) . f.(Ncns) _ J. ^ J^N.bs X 1 ^ 1 < i < N^^s , (2l) 

where ^^'^""^^ g R'^ob^^i and di G R^^ob^^^ are the i-th columns of the matrices F('^<="'=) 
and D, respectively. Following (19), the i-th column of the matrix Z is given by: 

Zi = - g^-^^-) ■ (1 + ■ g(-^"=))~' . ■ G R'^°-xi . (22) 

By equation (22), the computation of Z involves the solution of the linear systems 
(20b) and (21). We apply the Sherman-Morrison formula (18) again. The solution 
of the linear system (21) can be obtained as follows: 

f(Ncns) ^ f(Ncns-l) _ (Ncns-1) . (' 1 I _ p.(Nons - 1) ^ " ^ . 

i i O Ncns — 1 O J 

■vL-i • f/""^"'^ ^ (23) 

where ^ j^NobsXi ^^^^ g(Ncns-i) ^ j^NobsXi g^j^g ^Yie solutions of the following 

linear systems, respectively: 

^(Ncns-2) _ ^(Ncns-1) _ ^ J^Nobs X 1 

■yy-(Ne„s-2) _ g(Ncns-l) _ _ 1 G R"^"*"" ^ ""^ . 

The linear system (20b) can be solved similarly. Note that the solution of each 
linear system involves the computation of two new linear systems, derived from the 
matrix sequence (16). each of the new linear systems can be solved by applying 
recursively the Sherman-Morrison formula. For simplicity we denote by f and g the 
solutions of the new linear systems in each recursively application of the Sherman- 
Morrison formula. We have that: 

^(Nens) . Z = D , 



f 



8 



-1 



vT . ^^i^.ns^i) \ di G R""'^^^^ 1 < i < 



f 



' V ^ ^ ^ ^ii^ 

f g 

1 + v'^ , ■ w(^<=--2)"^ • V. 1 
f 

■i^ens -i^ciis -l^cns J- 



Nobs X 1 



f 



Vk 

V ^ ^ ■' 

f 



-1 



g 
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K-i ^ V ^ 
f 



f 



W«"'-x 



WW-'-x -wW -vi- I 1+v^ W(°)"'-Vi 
v^^ ■ W(0'~' ■ X, 



N„hs X 1 



R"^ ■ X G R''" 
R"^ ■ vi e R''°'^= 



xl 



where x G R'^ots^i can be either, a column of matrix D G R'^ots^f^cns or V G 
We note that: 

• The computation of W*^"^-* ^ ■ x involves the solution of the linear systems 
^(k-i) . f = X and W(i^-i) ■ g = Vk. 

• Since the recursion is based on the sequence of matrices defined in (16), the 
base case is the linear system R^^ -x in which the matrix R is (block) diagonal. 

From the previous analysis we derive a recursive Sherman-Morrison formula as 
follows. Define 



S (x, k) = < 



z = R^ x, 
f = 5(x,k-l) ; 
g = 5(vk,k-l) , 
z = f-g-(l + v,^.g)-^v,^-f: 



for k = , 
for 1 < k < Np 



(24) 
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where x G R'^°'^'=^^. the columns of Z G ]R'^obs^'^<=ns are computed as follows: 

Zi = 5 (di, Nens) e R^°'-^^ , 1 < i < Nens. 

The recursive the computations performed by S (•) can be represented as a tree 
in which the solution z G R'^obsXi of each node depends on the computations of its 
left (f G IR'^obs^^) and right (g G ]R"°bs^^) children (i.e., on the solutions of two 
linear systems). Figure 1 illustrates the derivation of linear systems in order to solve 
■ z = d for Ncns = 3 and d G {di, da, dg}, d G IR^"'^^^^ 




Figure 1: The recursive Sherman- Morrison formula {S (•)) applied to solve the linear 
system W*^^^ - Z = D G ]R"°bs^3_ jjgj.g jg g^j^y column of matrix D G IR'^obs^^^ Dashed 
nodes represent repeated computations. 

We see that S (•) solves multiple times identical linear systems. For instance, 
the repeated computations performed in order to solve W*-^-* ■ Z = D G R^obsxs 
are represented in Figure 1 as dashed nodes. There, for instance, the linear system 
R • g = Vi is solved four times in the last level. The total number of linear systems 
to solve is (9(Nens ' 2'*™=), i.e., it increases exponentially with regard to the number 
of ensemble members if identical computations are not avoided. Next subsection 
discusses how to achieve this and obtain an efficient implementation of the recursive 
Sherman-Morrison formula. 
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3.1 An iterative Sherman-Morrison formula for matrix in- 
version 

In order to avoid identical computations in Figure 1 we can solve the linear systems 
from the last level of the tree up to the root level. We denote by U G IR^obaXNens g^^^ 
Z G iR'^obs^'^cns the matrices holding partial results of the computations with regard 
to V and D, respectively. 

Level can be computed as follows without any repeated effort: 



U(o) 



(R-i-vi,R-i-V2,R-^-V3) 
(w(°)"'-vi,W(°)"'-V2,W(°)"'-V3 



(0) „(o) 



u , u 



1 ; "2 5 ^3 



Z(o) 



(R 1 di,R-i d2,R-i da) 

(w(0)"'-di,WW"'-d2,WW"'-d3 



,(0) JO) JO) 
'1 1^2 5 ^^3 



We make use of the Sherman- Morrison formula (18) and compute level 1 as follows: 



h(i) 



u 



(0) 



u 



(0) 



1 + vT. W(o)"'- vi 

1 



u 



(0) 



Z(i) 



W(°)~'-vi,W(i)"'-V2,wW"'-V3) 



(1) ,,(1) 



Ui , u 



1 ; "2 5 "^^3 



zf)-hw. (vrzf)),zf -h«■ 
w«"^dl,w«"'■d2,w«"^d, 



,(0) 



7(1) 



Note that u^'^'' has not been updated since it is not needed in the computations of the 
next levels. Similarly, the computations at level 2 make use of the Sherman- Morrison 
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formula (18): 
h(2) = u(^) ^ , 

' I + Vj. W(l)"'- V2 
(1) 1 

U(^) = (uS^u«,u«-h«.(v-.ui^))) 

= (W(°)"' ■ vi, WW~' ■ V2, W(2)~' . V3) 

= (W(2)~' . di, W(2)-' . d2, W(2)"' • ds) 

_ (J2) (2) (2)\ 

The vectors u^^^ and u'2'' are not required for the computations of level 3, and they 
are not updated. Making use of the Sherman- Morrison formula (18) once again, the 
root level is computed as follows: 

h(3) = u« , 

Z(3) = (zf ) - h(^) . (v- ■ zf )) , z?) - h(^) . (v- • z?)) , zf - h(^) . (v- . zf ) ) 
= (W(3)"' . di, W(3)~' . d2, W(3)~' ■ da) 

The computations performed by this iteration are shown in Figure 2. 
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Figure 2: Necessary computations for the solution of W*^'^^ ■ Z = D G ]R^°bsx3 ^gj^g 
the recursive Sherman-Morrison formula. This iterative version avoids all redundant 
computations. 

Some key features of the iteration are highlighted next. 

• The number of iterations is Nens- 

• At level matrices Z^°) and U'-''-' aree computed as follows: 

Z(o) ^ R-i . D G R''°'^=''''^"% 

• The matrix W^'') G R'^obsXNot, 

is never stored in memory. It can be represented 
implicitly by matrix V G jR'^obsXNons xhis implicit representation realizes con- 
siderable memory savings, especially when Nobs ^ Nens- 

• At iteration k, only the columns up'' with k < i < Nens are updated. 

In summary, the solution of the linear system (17) is obtained by the following 
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iteration: 



Level 



Level 1 




R 1 D 
R 1 V 



-1 



u 



(0) 



Z(o) - h(i) ■ (vT ■ ZW) 



u 



(0) 



Level k : 




T (k-1) 



U 



(k-1) 



Z(k-l) _h(k) . (vT.Z(k-l)) 



U: 



(k-1) 



1,- 



Level Np 



(Nons-l) 



h(Ncns) = h + 

\ Ncns / 



(Ncns-l) 



Since the matrix R has a simple structure its inverse is easy to obtain. We will 
consider here only the case where R is diagonal, but the extension to the block 
diagonal case is immediate^. Furthermore, those diagonal entries can be stored in 
an array in order to save memory: 



R = r 



Nobs 



where the components G r correspond to the diagonal entries Rj^j. 

Putting it all together, we define the iterative Sherman- Morrison formula (r, V, D) 
as follows: 

• Step 1. Compute the matrices Z(°) e R'^obsXNens ^nd U(°) G R^ob^xNen^ 
follows: 



Z(o) = ] 


r jo)i 
t% J 















A<i< Nobs, 1 < J < N 

e 



(25a) 
(25b) 



^In the case of nondiagonal matrices, under the assumptions done (R is block diagonal or ease 
to decompose), the computations Z'o) = R-^ • D e RN°b=x-^c„. and U(") = R-^ • V e RNobsXN™. 
can be efficiently performed. 
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where dij,Vij and rj are the components of matrices D G IR'^obsXNons^ y G 
jp^NobsXNens vcctor T G R""''^ rcspectively. 

Step 2. For k = 1 to Ncns compute: 

h^'^) = (l + v^uf''))"'u('^^'^GR'^°-"\ (26a) 

Z(k) _ Z^'^-^) - h^"^) ■ (Vif ■ Z^^-l)) G E'^obsXNens^ (26b) 

_ h(k) . (^vjf • Up"'^) G E'^°'==^\ k + 1 < i < Nens. (26c) 



(k) 



We now use the iterative Sherman-Morrison formula in the analysis step to obtain 
an efficient implementation of the Ensemble Kalman filter (SMEnKF). This filter is 
as follows. The background ensemble states are obtained from the forecast step 
(4), the ensemble mean is given by (la), and the ensemble deviations form the 
mean S are given by (13). The analysis is obtained as follows: 

B = Y -n (X^) G R^°b=x^<=-, 

V = ?^(S) G R"°''=^''"^% 

Z = 54r, V,D) G R«°b=x«-% 

X^ = X^ + S ■ ■ Z G R'*=*^*"''''^"% 

where the function "H (G) G RNobsXNcns jg efficient implementation of the observa- 
tion operator applied to several state vectors, represented by G G R"i>tateXNens_ 

3.1.1 Inflation aspects 

Infiation increases periodically the ensemble spread, such as to compensate for the 
small ensemble size, to simulate the existence of model errors, and to avoid filter 
divergence [16]. All the infiation techniques applied in traditional EnKF can be 
used, virtually without modification, in the context of SMEnKF. For example, after 
the forecast step, one can increase the spread of the ensemble 

Xi ^ xf + a (Xi - xf ) , 1 < i < Nens , 

such as the ensemble covariance is increased by a factor a"^ [29]. 
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3.1.2 Localization aspects 

Using (3), the analysis step can be written as follows: 

= X^ + AX^ 
AX^ = S-V^-Z 

= ■ ■ (H ■ ■ + R)"' D G R'^=--x'^- 

^ Pf •H'^- (H-Pf ■H^ + R)"^D G R''^*-*^^"-^. (27) 

Localization techniques are explained in detail in [4]. Localization replaces the en- 
semble based P^ by a matrix P^ = p o P^ in (27), where p is a localization matrix 
and o represents the Schur product. 

Clearly localization in the form (27) requires the full covariance matrix, and 
cannot be applied in the context of the iterative Sherman-Morrison implementation. 
Applying SMEnKF with a single data point yi leads to a correction AX^;|, which 

can be localized by multiplication with a diagonal matrix A{i} that scales down 
components with the negative exponential of their distance to the observation i 
location, and sets them to zero if outside the radius of influence: 

This can be applied in succession for all data points to obtain a fully localized 
solution. 

We discuss next a general approach to perform partial localization. Let Xj be an 
individual component of the state vector and yj an individual observation. Define the 
impact factor 6ij G [0, 1] of the information in yj on the state point Xi. For example, 
one can use a correlation length, and a radius about the measurement location outside 
which the impact factor is zero. Define the influence matrix A = (Sij) G R^^statcXNobs^ 
and replace (27) with the following partial localization formula 

AX^ ^ Pf ■ • (H • P'^ • + R) d, G R'^-^-x^-- 
= Ao(S-VT)-Z. 

The (i, i)-th entry contains the i-th component the correction vector for the i-th 
ensemble member and reads 

Nens Nobs 

AXf, = J2 Si.k '^i.j Zj,, , 1 < i < Nstatc , 1 < ^ < ^ens ■ (28) 

k=l j=l 

The components of the correction matrix (28) are independent of one another, and 
can be evaluated in parallel after the system solution Z has been computed. 
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3.2 Computational complexity 

In the complexity analysis of the iterative Sherman-Morrison formula we count only 
the long operations (multiplications and divisions). Moreover, as discussed before, 
we make the assumptions presented in [28, 19], namely, the data error covariance 
matrix R ^^o^^^^'^obs jg inexpensive to decompose, and the observation operator H 
can be applied efficiently to any vector. We now analyze each each step of the 
iterative Sherman-Morrison formula. 

In the first step (25) each row i of matrices D G IRf^obs^^Nons ^^^^^ y ^ j^NobsXNcns 
is divided by the corresponding component ri E R'^°''= in order to obtain Z^^^ G 
j^NobsXNons gj^(^ u^^^ G R.'^obs ^ respectlvety. This yields to Nobs ■ Nens number of long 
operation for each matrix, therefore: 

Tstepl (Nens, Nobs) = 2 ■ Nobs " Ncns- (29) 

In the second step (26) we compute the vector h^'^) G R""""" (26a), and the matrices 
Z(k) e ]RNobBXNens (^26b) aud TJC^) G RN^b^xNens (^26c) . The number of long operations 
for each of one are as follows: 

Nobs 



Nobs 

Ncns-Nobs 



Z(k) ^ z(k-i) _ j^(k) . I . 2(k~i) 



Ncns-Nobs 
Nobs 



Since the second step (26) is performed Ncns times, the number of long operations 
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can be expressed as: 



Tstep2 (Nens; Nobs) 



k=l 



\ 



k-1 



2-Nobs + 2-Nobs-Nens + (2 ■ Nobs) 

hW Z(k) 



v 



2 ■ Nens ■ Nobs + 2 ■ N^^3 ■ Nobs + - 1) ■ 2 ■ Nobs 



k=l 



3 ■ N,„, ■ Nobs + Nens ■ Nobs 



(30) 



Consequently, from (29)-(30), we have 



TsMF (Nens, Nobs) 



2 ■ Nobs ■ Nens + 3 ■ N;„, ■ Nobs + Ne 



Nobs 



Tstcpl (NonsjNobs) 

3 ■ (Nens ■ Nobs + Ne 



J- stcp2 (Nens, Nobs) 
■ Nobs) , 



which yields a complexity of 



O {nI^, ■ Nobs) 



(31) 



Note that when R is not diagonal, under the assumptions above, the computations 
(25) of Z(°) and U(°) can be efficiently performed in (9(Nobs"Nens) long operations; the 
overall effort becomes 3 ■ (N^^g ■ Nobs + Ng^g ■ Nobs)- This leads to the same complexity 
(31) for R diagonal, block diagonal, or in general easy to decompose. 
The overall complexity of the analysis step for the iterative formula 



C(Ncns-Nobs) 



+ s ■ . ^ 

^(Nins-Nobs) 



O(NLs-Nstato) 



is: 



O (N^^g ■ Nobs + N^^g ■ Nstate) 



(32) 



The complexity of the proposed implementation of the EnKF is equivalent to the 
upper bounds os the methods described in [28], as detailed in the Table 1. The term 
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Ngj^g does not appear in the upper-bound of the proposed method even when R is not 
diagonaL This term can affect the performance of the EnKF when the number of 
observations is not large enough with respect to the number of ensemble members. 



Analysis method 


Computational cost 


Direct [28] 

Serial [5] (for each observation) 
ETKF [3] 
EAKF [3] 

Proposed EnKF Implementation 


O (n2„, ■ Nobs + n3^, + N^^, • Nstatc) 
O (Ncns ■ Nobs + Ncns " Nobs " Ngtatc) 
O (n2„3 ■ Nobs + N^^s + nL ■ Nstatc) 
O (nL ■ Nobs + nL + nL ■ Nstatc) 
O (n2„^ ■ Nobs + N^^s ■ Nstatc) 



Table 1: Summary of computational costs of the analysis steps for several ensemble 
filters. The costs are functions of the ensemble size Nens, number of observations Nobs 
and state dimension Nstatc- 



3.3 Parallel implementation 

In this section we discuss an efficient parallel implementation of the iterative Sherman- 
Morrison formula. Since the algorithm (25)-(26) can be applied individually to each 
column of the matrices Z^") e IR^^obs and U^^^ e j^NobsXNcns^ there are 2 Nens com- 
putations that can be performed in parallel. We define the matrix G^^-* G iR^obs^^ Ncns 
holding the columns of V G R^^obsXNen^ g^^^ ^ j^NobsXNens as follows: 



G(°) = [V,D] = [Vi,V2,...,V,_,di,d2,...,d, 



(33) 



,{0) (0) (0) (0) 

3l ; • • • ; SNons' ONens + l' ' ' ' ' 02-Nens 



G R''"'^' 



x2-No 



Let Npj.Q^ be the number of available processors at the initial time. The number of 



operations per processor is 



_ ^ " Nens 

P ~ NO 

proc 



The matrix (33) can be written as 

G(o) 



p,{0) p.(0) T3(0) 

'- ' proc 



where the blocks B|-°^ G ]R"ob=^c;° 



B 



(0) 



,(0) (0) (0) 

'(i-l)CO + l' S(i-l)C0+2' • • • &iCO 



G R"°bsxc« for 1 < i < N°^„,, 
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The parallel, first step (25) of the iterative Sherman-Morrison formula is implemented 
as an update over the blocks: 



B 



(1) 



R-^ • ^ G R^°bsxco ^ all 1 < i < N' 





proc ' 



which yields 



-D-l pj(0) -D-1 pj(0) p5(0) 

^ ^ proc 

p>(1) p>(1) 
JDi , , . . . , -Djto 

' proc 



.(1) 



&1 ) • • • ) &Ncns5 ONcns + 1' ' ' ' ' &2-Ncns 
^ ^ ■' 



U(0) 



Z(o) 



(0) „(0) 



1 ) "2 ) • • • ) "Ncns 5 



(34) 



The second step (26) of the iterative Sherman-Morrison formula consists of a 
sequence of updates applied to the matrices Z^^^ and \J^^\ Such matrices are repre- 
sented by the columns of matrix G^^^. Thus, consider the computation of level one, 
each column of the matrix G*-^-* can be updated as follows: 



1 + v 



1 &1 



Similarly to the first step, the computations can be grouped in blocks 



B 



(1) 



'(i-l)Ci+2' &(i-l)Ci+3' • • • 



NobsXCi 



for 1< i < Ni 



proc 



and distributed over the processors: 



B 



(2) 



B 



(1) „(i) 



1 + v 



1 &1 



■ bP^ ) e R'^" 



)N„KsXCi 



for all 1< i < Ni,., 



proc 



Note that g^^"* (u^^-*) is not updated since it is not required in subsequent computa- 
tions. Thus, for the matrix G*^^-* 



G(2) 



(1) (2) (2) (2) 
1 ' &2 '63 5 • • • 5 &2-No 



the next common computation is gf^ (^2^^), and for the same reasons, this vector is 
not updated. 
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In general, at time step t, 1 < t < Nens, the first t columns of the matrix G^*-* are 
not included in the update process: 



(1) (2) (t-1) (t) (t) (t) 

5l 7 • • • ) §t-l 7 St 7 gt+l7 • • • 7 g2-Nc 



The parallel computation of (26) at time step t is performed as follows: 

• Compute the number of computation units (columns of matrix G*-*) G ]R,'^°bsXNcns^ 
per processor: 

f~<t ^ ' -^cns t 
proc 

• Perform the update in parallel over the blocks: 

B(t) = Bf ) - gf ) . (1 + v7 . gf . {^rj . Bf G R^-x^^ , for all 1 < i < N^.^, , 



where 



B(t) 



(1) (1) 
'(i-l)Ci+l+t' S(i_i)ci+2+t' • • • SiCi+t 



This parallel implementation of the iterative Sherman Morrison formula leads to 
the complexity: 

Ncns 

T(Nobs7Nens) = 0(C°-Nobs)+$^0(C^ ■Nobs • 

stepl stcp2 

Notice, when the number of processors at time < t < Nens is Npj.^^, = 2 Ncns — t 
then Cp = 1. Hence, the corresponding overall complexity of the SMEnKF is: 

T(Nobs7 Nens) = 0{Nohs " Ngns + Nstate " Nens) , 

therefore, when the number of observations is large enough relative to the number of 
ensemble members, this parallel approach of the iterative Sherman- Morrison formula 
exhibits a linear behavior, making this implementation attractive. 
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4 Experimental Results 



4.1 Experimental setting 

The Sherman-Morrison EnKF implementation as well as the EnKF implementations 
based on Cholesky and SVD are coded in Fortran 90. The Cholesky and SVD 
decompositions use functions from the LAPACK library [2] as follows: 

• The matrix W e Ei^ob=><i^obs jg b^jit using DSYRK as follows: 

Ncns - 1 

• The functions DPOTRF and DPOTRI are used to compute the Cholesky de- 
composition of matrix W G ]R,NobsXNobs_ 

• The SVD decomposition is performed tby the DGESVD function. 

In order to measure the quality of the solutions we employ the following perfor- 
mance metrics. The Elapsed Time (ET) measures the overall simulation time for a 
method *. This metric is defined as follows: 

ET(*) = Forecast* + Analysis* (35) 

Where Forecast* and Analysis* are the running time for the overall forecast and 
analysis steps respectively. 

The Root Mean Square Error (RMSE) is defined as follows: 

_ /Nstcps 

e {*) = RMSE = J2 I^SEt 



■^steps 



t=l 



where Ngtcps is the number of time steps and RSEt is the Root Square Error at time 
t defined as follows: 



RSEt = \-^- - xf)^ ■ (x^- - xP) 

V state 

where x*'""*' is the true vector state at time t, and x^ can be either the ensemble mean 
in the forecast x^ or analysis x"^ at time t. As can be seen the RMSE measures in 
average the distance between a reference solution (x*''"'^) and the given solution (x^). 

The EnKF implementations are tested on two systems: the Lorenz 96 model 
[18] representing the atmosphere, and a quasi-geostrophic model [6] representing the 
ocean. They define the model operators (A^) in the EnKF experiments. To compare 
the performance of different EnKF implementations we measure the elapsed times 
and the accuracy of analyses for different values of Nobs and Nens- 
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4.2 Lorenz-96 model 



The Lorenz 96 model is described by the following system of ordinary differential 
equations [18]: 



(X2 - Xn^,^,^_i) ■ Xn,,,,, - xi + F for i = 1 

(Xi+i - Xi_2) ■ Xi_i - Xi + F for 2 < i < Nstatc " 1 , (36) 

(Xi - Xn,,,,,_2) ■ XN,tatc-l - ^Nstatc + F for 1 = Ngtate 

which has been heuristically formulated in order to take into in account properties 
of global atmospheric models such as the advection, dissipation and forcing. This 
model exhibits extended chaos with an external forcing value (F = 8), when the 
solution is in the form of moving waves. For this reason, the model is adequate to 
perform basic studies of predictability. 

The test assesses how the efficiency of the EnKF implementations depend on the 
input parameters Nobs and Ncns when Nobs ~ ^ens (the number of observations and 
ensemble members are relatively close). The experimental setting is described below. 




• One time unit of the Lorenz 96 model corresponds to five days of the atmo- 
sphere. The observations are made over 500 days (100 time units). 

• The background error is assumed to be 5%, i.e., the initial ensemble mean's 
deviation from the reference solution is drawn from a normal distribution whose 
standard deviation is 5% of the reference value. 

• The external forcing is set to F = 8.0. 

• The dimensions of the model state are Nstate e {500, 1000,3000,5000}. While 
the typical dimension for the Lorenz-96 model is Ngtatc = 40, we scale the 
system up to assess the performance of different implementations. 

• The number of observations equals the number of states. Nobs = Nstatc- Due to 
this, the analysis step involves large linear systems of size W G iRi^statoX Nstatc _ 

• The number of ensemble members Ncns depends on the size of the state vector 
as shown in Table 2. 

• At each time t, the synthetic observations are constructed as follows: 

y, = xf "'^ + et G R"=*^*« (37) 
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since the number of observations and variables of the vector state are the same, 
et belongs to a normal distribution with zero mean and covariance matrix 

R = diag {O.Ol^} G R'^^'-t- , 

as is usual in practice. The errors are replicated for each compared, EnKF 
implementation. Due to this, the same data errors are hold for all tests. 

• The assimilation window is five model days. 

• The localization (28) is applied using the influence factors 

V Nstate / 

where min{i,j} is the minimum distance between the indexes i and j of the 
vector state; this distance accounts for the periodic boundary conditions in 
(36). 

The RMSE results are shown in Table 3. All methods provide virtually identi- 
cal analyses. As expected, the analysis improves when the size of the ensemble is 
increased. 



Nstate 


N 

^ ens 


500 


{200,250,300,350,400} 


1000 


{400,450,500,550,600} 


3000 


{900,950,1000,1050,1100} 


5000 


{1500,1550,1600} 



Table 2: Number of ensemble members N^ns with respect to the dimension of the 
vector state Ngtate- 
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Natate/Nobs 


Step 




EnKFsher 


EnKFchoi 


EnKFsvD 


500/500 


Forecast 


200 
250 
300 
350 
400 


0.006491846885685 
0.003089844737585 
0.001923620680204 
0.001501324727984 
0.001238879857327 


0.006491846885685 
0.003089844737585 
0.001923620680204 
0.001501324727984 
0.001238879857327 


0.006491846885685 
0.003089844737585 
0.001923620680204 
0.001501324727984 
0.001238879857327 


Analysis 


200 
250 
300 
350 
400 


0.005406046859821 
0.002577923867043 
0.001595214779293 
0.001239247824936 
0.001019599347515 


0.005406046859821 
0.002577923867043 
0.001595214779293 
0.001239247824936 
0.001019599347515 


0.005406046859821 
0.002577923867043 
0.001595214779293 
0.001239247824936 
0.001019599347515 


1000/1000 


Forecast 


400 
450 
500 
550 
600 


0.004416601672762 
0.002808556499338 
0.002256025116209 
0.001827747641793 
0.001561032221877 


0.004416601672762 
0.002808556499338 
0.002256025116209 
0.001827747641793 
0.001561032221877 


0.004416601672762 
0.002808556499338 
0.002256025116209 
0.001827747641793 
0.001561032221877 


Analysis 


400 
450 
500 
550 
600 


0.003643849246395 
0.002320171220995 
0.001863322297334 
0.001503808242092 
0.001281928557481 


0.003643849246395 
0.002320171220995 
0.001863322297334 
0.001503808242092 
0.001281928557481 


0.003643849246395 
0.002320171220995 
0.001863322297334 
0.001503808242092 
0.001281928557481 


3000/3000 


Forecast 


900 
950 
1000 
1050 
1100 


0.009439626047886 
0.007199551317193 
0.005410752525373 
0.004299142614958 
0.003476460994219 


0.009439626047886 
0.007199551317193 
0.005410752525373 
0.004299142614958 
0.003476460994219 


0.009439626047886 
0.007199551317193 
0.005410752525373 
0.004299142614958 
0.003476460994219 


Analysis 


900 
950 
1000 
1050 
1100 


0.007957954481050 
0.006074278547494 
0.004552394986586 
0.003597428757064 
0.002902565379165 


0.007957954481050 
0.006074278547494 
0.004552394986586 
0.003597428757064 
0.002902565379165 


0.007957954481050 
0.006074278547494 
0.004552394986586 
0.003597428757064 
0.002902565379165 


5000/5000 


Forecast 


1500 
1550 
1600 


0.007678111830765 
0.006378636076218 
0.005608079561230 


0.007678111830765 
0.006378636076218 
0.005608079561230 


0.007678111830765 
0.006378636076218 
0.005608079561230 


Analysis 


1500 
1550 
1600 


0.006490749509210 
0.005389012789042 
0.004728980886456 


0.006490749509210 
0.005389012789042 
0.004728980886456 


0.006490749509210 
0.005389012789042 
0.004728980886456 



Table 3: RMSE for the Lorenz-96 model with different number of states. When the 
number of ensemble members is increased the estimation of the true vector state is 
improved. All EnKF implementations provide virtually identical results. 
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xlO"' pXlO"' pXlO"' 




Ph °0 50 100 °0 50 100 °0 50 100 



(a) Sherman (b) Cholesky (c) SVD 

Figure 3: Analysis RMSE for the Lorenz model with 500 of vari- 
ables. Different curves correspond to different numbers of ensemble members: 
200(+),250(-),300(n),350( ) and 400(.). When the number of ensemble mem- 
bers is increased the analysis is improved. 

Figures 3 and 4 show the RMSE decrease over the assimilation window for Nstate = 
500 and 1000, respectively. When the number of ensemble members is increased the 
analysis errors are smaller, as expected. There is no significant difference in results 
between different implementations of the EnKF. 




Ph V 50 100 V 50 100 V 50 100 



(a) Sherman (b) Cholesky (c) SVD 

Figure 4: Analysis RMSE for the Lorenz model with 1000 of vari- 
ables. Different curves correspond to different numbers of ensemble members: 
200(+),250(-),300(n),350( ) and 400(.). When the number of ensemble mem- 
bers is increased the analysis is improved. 

The ET results are shown in Table 4. The Cholesky decomposition is the most 
efficient for a small number of observations and states. When the number of obser- 
vations is increased, the relative performance of Cholesky deteriorates, as expected 
from the complexity results presented in Section 1. The Cholesky decomposition 
solution of the linear system (6) is not suitable when the number of observations is 
large. The SVD implementation exhibits a good performance for a small number 
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of ensemble members and observations. However, the ET of the SVD implemen- 
tation grows faster than that of the Cholesky implementation when the number 
of ensemble and/or observations are increased, due to the term N|!j^g ~ N^^^ in its 
complexity formula. Tthe EnKF implementation based on SVD is not suitable for 
a large number of observations or a large number of ensemble members. Finally, 
the Sherman-Morrison implementation has the best performance for a large number 
of observations and states. This implementation is suitable for a large number of 
observations. Since the term JSS^^^ does not appear in the cost upper-bound of the 
iterative-Sherman implementation, when Nens ~ Nobs, the proposed implementation 
will exhibit a better performance than those implementations presented in [3, 5, 28] 
since they are upper-bounded by (12) (see table 1). 



N u /N ^ ^ 

J-^obs/ -L^ state 


N 

■'- ^ ens 


H/lirvr Sher 


ji/iixvr Choi 


EnKFsvD 




200 


46.5 s 


36.9 s 


66.3 s 




250 


72.1 s 


45.1 s 


91.4 s 


500/500 


300 


103.5 s 


54.2 s 


118.9 s 




350 


140.8 s 


64.4 s 


138.8 s 




400 


183.6 s 


75.4 s 


174.5 s 




400 


377.6 s 


327.4 s 


687.1 s 




450 


500.1 s 


364.6 s 


715.9 s 


1000/1000 


500 


601.7 s 


403.5 s 


877.3 s 




550 


731.8 s 


462.4 s 


1038.3 s 




600 


873.2 s 


492.4 s 


1199.3 s 




900 


1.7 h 


2.3 h 


3.8 h 




950 


2.1 h 


2.4 h 


3.9 h 


3000/3000 


1000 


2.2 h 


2.5 h 


4.1 h 




1050 


2.4 h 


2.7 h 


4.4 h 




1100 


2.6 h 


2.8 h 


4.8 h 




1500 


8.1 h 


11.0 h 


17.3 h 


5000/5000 


1550 


8.8 h 


11.2 h 


17.8 h 




1600 


9.2 h 


11.5 h 


19.9 h 



Table 4: Computational times for the Lorenz model assimilation performed with dif- 
ferent EnKF implementations. The Cholesky decomposition is the most efficient for 
a small number of observations and states. The Sherman-Morrison implementation 
is the bet for a large number of observations and states. 
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4.3 Quasi-geostrophic model 



The Earth's ocean has a complex flow system influenced by the rotation of the Earth, 
the density stratification due to temperature and salinity, as well as other factors. 
The quasi-geostrophic (QG) model is a simple model which mimics the real behavior 
of the ocean. It is defined by the following partial differential equations [6]: 



• r- J{il),q)+ (5-^ = -rkb-C + rkh-V\-rkh2-V\ 



dt ' dx 



+ sin (2 ■ TT ■ , (38) 



External Force 



where 



' dx dy dy dx ' 

q = ( — F-i/) is the potential vorticity, ip is the stream function, F is the Froud number, 
C = V^ip is the relative vorticity, r is a sort of the Rossby number, rkb is the bottom 
friction, rkh is the horizontal friction and rhk2 is the biharmonic horizontal friction 
and X and y represent the horizontal and vertical components of the space. 

Moreover, q and ip are related to one another through an elliptic operator [23]: 

q = V^ijj, (40) 

which yields 

V-2g = ^ , (41) 

where 

+ 



(9x2 Qy2 

This elliptic property reflects the assumption that the flow is geostrophically 
balanced in the horizontal direction, and hydrostatically balanced in the vertical 
direction. 

The QG experiment studies the behavior of EnKF implementations when Nobs ^ 
Nens (the number of observations is much larger than the number of ensemble mem- 
bers) as is usually the case in practice. Moreover, this scenario is more difficult than 
the previous one (the Lorenz model): large model-errors are considered in the initial 
ensemble members. Besides, data is available every 10 time units. 
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We consider three different grids, denoted QGNM, where the number of hori- 
zontal and vertical grid points are N and M, respectively. Specifically, we employ 
in experiments QG33 (small instance), QG65 (medium instance) and QG129 (large 
instance). The horizontal and vertical dimensions of the grid are denoted by Lx 
and Ly respectively. These instances and the corresponding parameter values are 
summarized in Table 5. 



Instance 


Lx 


Ly 


N 


M 


rkb 


rkh 


rkli2 


/3 


r 


QG33 


0.4 


0.4 


33 


33 


10-" 


10-^ 


2 X 10-^^ 


1.0 


10-" 


QG65 


1.0 


1.0 


65 


65 


10-" 


10-'^ 


2 X 10-1^ 


1.0 


10-" 


QG129 


1.0 


1.0 


129 


129 


10-" 


10-^ 


2 X 10-^^ 


1.0 


10-" 



Table 5: Parameter values for the QG model instances considered. Lx and Ly repre- 
sent the horizontal and vertical grid sizes, and N and M are the number of horizontal 
and vertical grid points, respectively. 



The experimental settings are described below. 

• There are 1200 time steps, each of one representing 1.27 days in the ocean. 

• The vorticity of the ocean at each grid point provides a component of the vector 
state. 

• The computation of the stream function is done through the solution of the 
Helmholtz [22] function according to the elliptic property (41). 

• Homogeneous Dirichlet boundary conditions are assumed. Due to this, the 
boundaries of the grid are not mapped into the state vector, and Nstate = 
(N-2) ■ (M-2). 

• The initial ensemble members are constructed as follows: 

-I Nstato 



B _ ^tmc + / V- I trucl ^ j^NetatoXl ^ for 1 < i < N 

* Nstate ^ ' 



ens ) 



k=l 



where is drawn from a Normal distribution with zero mean and covariance 
matrix 



Q = diag{STDL} eR 



Nstate X Nstate 
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For testing purposes, three values are assumed for the standard deviation of 
model errors (STDens): 2.5, 5.0 and 7.5. Notice the large dispersion of the initial 
ensemble members, which can make difficult the convergence of any filter since 
the huge spread out of the initial ensemble members with respect to the typical 
value C. 

• The number of observation per simulation, for each size (Nstatc) of the model 
state , is defined as follows: 

Nobs = Pobs ■ Nstate; 

where Pobs is the percentage of components observed from the model state. 
The values given to Pobs are 50%, 70% and 90%. Those, measurements are 
taken every 10 time units and they are constructed as shown in equation (37). 
Notice, there are 120 analysis steps out of 1200 time steps (10% of the total 
simulation time). 

• For the time evolution of the model, zero boundary conditions are assumed 
and the boundaries are not included onto the ensemble representation. Due to 
this, the dimension of the vector state Ngtatc = (N — 2) ■ (M — 2). 

• For each instance we consider simulations with Nens ^ {20, 60, 100} ensemble 
members. The number of ensemble members is one to two orders of magnitude 
smaller than the total number of observations. 

The RSME values for analysis errors for the QG33, QG65 and QG129 instances 
are shown in Tables 6, 8 and 10, respectively. The results depend on the number 
of ensemble members (Nens), the number of observations (Nobs), and the deviation of 
the initial ensemble mean (STDgns)- The RSME is quantifiess errors in the stream 
function ip, whose values are computed through the relation (41). In terms of accu- 
racy there is no significant difference between different EnKF implementations. As 
expected, when the error in the initial ensemble is increased, the accuracy in the 
analysis decreases. The error does not show an exponential growth, even when the 
number of components in the model state (Ngtate) is much larger than the number 
of ensemble members (e.g., for the QG129 instance). When the number of ensemble 
members is increased, the analysis error is decreased. This is illustrated by the snap- 
shots of the QG33 simulation over 1200 time steps presented in Figure 5. There, we 
can clearly see that the ensemble of size 100 provides a better estimation (x^) to the 
true state of the model (x'''"°) than the ensembles of sizes 20 and 60. Additionally, 
the number of observations plays an important role in the estimation of the true 
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model state when the size of the vector state is much larger than the number of 
ensemble members. 

The ET values for the QG33, QG65 and QG129 simulations are shown in Tables 
6, 8 and 10, respectively. The time is expressed in seconds (s) if it is below 30 
minutes, and otherwise is expressed in minutes (min) and hours (h). The Cholesky 
implementation shows good performance when the number of observations is small. 
From Table 7 (the blocks where the number of observations are 480, 672 and 864) we 
see that the Cholesky implementation performance is more sensitive to the number of 
observations than to the number of ensemble members. This EnKF implementation 
is not suitable for a large number of observations. For instance, the Cholesky elapsed 
time for the QG129 instance is not presented since each simulation takes more than 
4 days in order to be completed. 

The SVD implementation shows a better relative performance than for the Lorenz 
96 test, since the number of ensemble members is small with respect to the number 
of observations. For example, for the QG33 instance, the SVD implementation shows 
a better performance than Cholesky when the number of observations and ensemble 
members are small. In addition, when the size of vector state is increased, the SVD 
implementation shows a better performance than Cholesky. This agrees with the 
computational complexity upper bounds presented in Section 1. As is expected, the 
performance of the SVD based methods is better than the Cholesky implementations 
when the number of observations is much larger than the number of the ensemble 
members. 

The Sherman-Morrison implementation shows the best performance among the 
compared methods. This is true even when the number of observations is much 
larger than the number of ensemble members, as seen in Table 11. The results of 
both test cases (the quasigeostrophic and Lorenz models) lead to the conclusion 
that the performance of the iterative-Sherman implementation is not sensitive to the 
increase in the number of observations, making it attractive for implementation with 
large-scale observational systems. 
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N 

■L ^ ens 


Nobs 


STDens 


EnKFsher 


EnKFchoi 


EnKFsvD 


20 


480 


5 

10 
15 


1.71348538 x IQ-'* 
3.42503478 x 10"'* 
5.13685273 x IQ-* 


1.71348538 x 10"'* 
3.42503478 x lO"'^ 
5.13685273 x 10"'' 


1.71348538 x lO"* 
3.42503478 x 10"^ 
5.13685273 x 10-* 


672 


5 

10 
15 


1.68916683 x IQ-* 
3.37557073 x IQ-* 
5.06204927 x 10"'' 


1.68916683 x 10"^ 
3.37557073 x lO^'^ 
5.06204927 x 10"'^ 


1.68916683 x 10"* 
3.37557073 x lO"* 
5.06204927 x 10"* 


864 


5 

10 
15 


1.68758284 x 10"* 
3.37604000 x 10"^ 
5.06437992 x 10"'' 


1.68758284 x lO"'* 
3.37604000 x 10"'^ 
5.06437992 x 10"'^ 


1.68758284 X 10"" 
3.37604000 x 10"" 
5.06437992 x lO""* 


60 


480 


5 

10 
15 


1.71410180 X 10-* 
3.43004244 x lO^'* 
5.14603943 x 10"'' 


1.71410180 X lO"'* 
3.43004244 x 10"'^ 
5.14603943 x 10"'^ 


1.71410180 X 10"" 
3.43004244 x lO"* 
5.14603943 x 10"* 


672 


5 

10 
15 


1.64430970 x 10"'' 
3.29692664 x lO""* 
4.94948120 x 10"" 


1.64430970 x lO"'' 
3.29692664 x lO"'^ 
4.94948120 x 10"^ 


1.64430970 x 10~* 
3.29692664 x 10-* 
4.94948120 x 10"'' 


864 


5 

10 
15 


1.64737541 x 10"* 
3.29861914 X lO""* 
4.94925540 x lO--* 


1.64737541 x lO"'' 
3.29861914 x 10"'^ 
4.94925540 x 10"'^ 


1.64737541 x 10"* 
3.29861914 X lO"* 
4.94925540 x 10"* 


100 


480 


5 

10 
15 


1.62358824 x 10-* 
3.23304631 x 10""' 
4.84328422 x 10"^ 


1.62358824 x lO"'* 
3.23304631 x lO""^ 
4.84328422 x lO"'' 


1.62358824 X 10"* 
3.23304631 x lO"* 
4.84328422 x lO"'* 


672 


5 

10 
15 


1.54859388 x 10"* 
3.10925745 x 10"'' 
4.66772642 x 10"" 


1.54859388 x lO"'' 
3.10925745 x lO^'^ 
4.66772642 x lO""^ 


1.54859388 x 10"* 
3.10925745 x 10""* 
4.66772642 x 10"^ 


864 


5 

10 
15 


1.44950729 x 10"* 
2.90313729 x lO""* 
4.35737112 x 10"'* 


1.44950729 x lO^'^ 
2.90313729 x lO^'^ 
4.35737112 x lO"'' 


1.44950729 x 10-* 
2.90313729 x 10-* 
4.35737112 x 10"^ 



Table 6: Analysis RMSE for different EnKF implementations applied to the QG33 
instance. All methods give similar results. When the number of ensemble and/or 
observations is increased, the analysis accuracy is improved. 
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X^-« xf,Nens = 20 Xf, Nens=60 xf , N^n, = 100 





Figure 5: Snapshots of the QG33 simulation for Nens = 20,60 and 100 members, at 
the time steps t = 0,239,478,717,956 and 1195 (out of 1200). As expected, when 
the number of ensemble members is increased the estimation of the true state (x*''"°) 
is improved (the RMSE is decreased). 
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N 

^ ens 


Nobs 


STDens 


EnKFsher 


EnKFchoi 


EnKFsvD 






2.5 


17.6 s 


33.4 s 


24.7 s 




480 


5.0 


17.2 s 


32.8 s 


25.3 s 






7.5 


17.3 s 


33.1 s 


28.6 s 






2.5 


17.9 s 


61.9 s 


39.4 s 


20 


672 


5.0 


17.7 s 


62.9 s 


49.3 s 






7.5 


17.8 s 


62.1 s 


37.6 s 






2.5 


17.8 s 


113.7 s 


57.9 s 




864 


5.0 


18.2 s 


116.3 s 


79.7 s 






7.5 


18.1 s 


118.4 s 


61.2 s 






2.5 


42.9 s 


57.8 s 


63.7 s 




480 


5.0 


42.9 s 


57.5 s 


62.9 s 






7.5 


42.8 s 


57.8 s 


58.3 s 






2.5 


44.3 s 


90.3 s 


142.3 s 


60 


672 


5.0 


44.5 s 


89.8 s 


92.3 s 






7.5 


44.5 s 


90.3 s 


91.3 s 






2.5 


46.8 s 


150.6 s 


187.3 s 




864 


5.0 


46.6 s 


156.7 s 


144.8 s 






7.5 


46.5 s 


154.3 s 


200.9 s 






2.5 


72.3 s 


83.2 s 


102.0 s 




480 


5.0 


72.4 s 


83.2 s 


96.8 s 






7.5 


72.6 s 


83.4 s 


94.4 s 






2.5 


77.8 s 


118.9 s 


140.2 s 


100 


672 


5.0 


77.8 s 


119.5 s 


166.8 s 






7.5 


77.1 s 


120.1 s 


216.4 s 






2.5 


82.8 s 


209.1 s 


412.9 s 




864 


5.0 


83.2 s 


212.7 s 


289.7 s 






7.5 


82.7 s 


202.6 s 


304.6 s 



Table 7: Computational times for several EnKF implementations applied to the 
QG33 instance. Different numbers of ensemble members and numbers of observations 
are considered. 
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N 

■L ^ ens 


Nobs 


STDens 


EnKFsher 


EnKFchoi 


EnKFsvD 


20 


1984 


2.5 
5.0 
7.5 


2.38730154 x 10"* 
4.77366714 x lO^'' 
7.16006328 x lO"* 


2.38730154 x 10"'^ 
4.77366714 x 10^'^ 
7.16006328 x 10"^ 


2.38730154 x lO"* 
4.77366714 x 10"^ 
7.16006328 x lO^* 


2778 


2.5 
5.0 
7.5 


2.38650829 x lO"* 
4.77195277 x 10"'' 
7.15744082 x 10"'' 


2.38650829 x lO"'' 
4.77195277 x 10""^ 
7.15744082 x lO"'' 


2.38650829 x 10"* 
4.77195277 X 10"^ 
7.15744082 x 10"* 


3572 


2.5 
5.0 
7.5 


2.38731447 x 10"* 
4.77279065 x 10""' 
7.15829932 x 10"^ 


2.38731447 x lO"'' 
4.77279065 x lO"'' 
7.15829932 x 10^'* 


2.38731447 x 10"" 
4.77279065 x 10"" 
7.15829932 x lO"" 


60 


1984 


2.5 
5.0 
7.5 


2.34788461 x 10""* 
4.69640125 x lO""* 
7.04505865 x 10"'' 


2.34788461 x lO"'' 
4.69640125 x lO"'^ 
7.04505865 x lO"'^ 


2.34788461 x 10"" 
4.69640125 x lO"* 
7.04505865 x 10"" 


2778 


2.5 
5.0 
7.5 


2.34427724 x 10"* 
4.68838293 x 10"'' 
7.03254635 x 10"^ 


2.34427724 x lO"'' 
4.68838293 x lO"'^ 
7.03254635 x lO"'^ 


2.34427724 x 10"* 
4.68838293 x lO^"* 
7.03254635 x 10"'' 


3572 


2.5 
5.0 
7.5 


2.34303497 x 10"" 
4.68673565 x 10""' 
7.03046901 X 10"'' 


2.34303497 x lO"'' 
4.68673565 x 10"'^ 
7.03046901 X 10"'^ 


2.34303497 x 10"* 
4.68673565 x 10^'' 
7.03046901 X 10"* 


100 


1984 


2.5 
5.0 
7.5 


2.37123911 X 10-"* 
4.74051478 x 10"'' 
7.10951271 X 10"'' 


2.37123911 X lO"'" 
4.74051478 x 10"'^ 
7.10951271 X lO"'' 


2.37123911 X 10-* 
4.74051478 x lO"* 
7.10951271 x lO"'* 


2778 


2.5 
5.0 
7.5 


2.34083117 X 10"* 
4.68257831 x 10"'' 
7.02424460 x 10"'' 


2.34083117 X lO"'' 
4.68257831 x lO^'^ 
7.02424460 x lO""^ 


2.34083117 X 10"" 
4.68257831 x 10"* 
7.02424460 x 10"^ 


3572 


2.5 
5.0 
7.5 


2.32378928 x 10"* 
4.64697670 x lO""* 
6.97023351 x 10"'' 


2.32378928 x lO^'^ 
4.64697670 x lO^'^ 
6.97023351 x lO"'' 


2.32378928 x lO"* 
4.64697670 x lO^* 
6.97023351 x 10"^ 



Table 8: Analysis RMSE for different EnKF implementations applied to the QG65 
instance. All methods give similar results. 
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N 

^ ens 


Nobs 


STDens 


EnKFsher 


EnKFchoi 


EnKFsvD 






2.5 


71.5 s 


44.9 min 


454.7 s 




1984 


5.0 


70.6 s 


33.4 min 


429.3 s 






7.5 


70.3 s 


45.4 min 


360.6 s 






2.5 


71.0 s 


1.8 h 


831.1 s 


20 


2778 


5.0 


73.2 s 


1.3 h 


735.7 s 






7.5 


71.9 s 


1.4 h 


795.8 s 






2.5 


72.2 s 


3.8 h 


1602.5 s 




3572 


5.0 


74.3 s 


3.3 h 


1112.3 s 






7.5 


72.2 s 


3.0 h 


771.5 s 






2.5 


179.0 s 


45.0 min 


1215.7 s 




1984 


5.0 


179.5 s 


55.3 min 


1235.6 s 






7.5 


178.4 s 


51.9 min 


1066.7 s 






2.5 


190.6 s 


2.4 h 


1463.3 s 


60 


2778 


5.0 


188.3 s 


1.9 h 


39.4 min 






7.5 


189.7 s 


2.5 h 


32.4 min 






2.5 


202.6 s 


4.1 h 


1.2 h 




3572 


5.0 


198.9 s 


2.9 h 


1.1 h 






7.5 


201.9 s 


4.3 h 


1.0 h 






2.5 


313.8 s 


52.4 min 


37.2 min 




1984 


5.0 


314.3 s 


40.1 min 


33.2 min 






7.5 


309.6 s 


58.4 min 


37.9 min 






2.5 


346.9 s 


1.7 h 


1.1 h 


100 


2778 


5.0 


342.9 s 


1.7 h 


1.0 h 






7.5 


340.7 s 


2.9 h 


1.1 h 






2.5 


373.9 s 


4.8 h 


1.6 h 




3572 


5.0 


378.3 s 


4.0 h 


1.8 h 






7.5 


383.7 s 


5.1 h 


1.3 h 



Table 9: Computational times for several EnKF implementations applied to the 
QG65 instance. Different numbers of ensemble members and numbers of observations 
are considered. 
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N 

■L ^ ens 


Nobs 


STDens 


EnKFsher 


EnKFsvD 






2.5 


9.92105438 x 10" 


b 


9.92105438 x 10' 


b 




8064 


5.0 


1.98414661 X 10" 


4 


1.98414661 X 10" 


4 






7.5 


2.97618537 x 10" 


4 


2.97618537 x 10' 


4 






2.5 


9.90045121 X 10" 


b 


9.90045121 X 10' 


b 


20 


11290 


5.0 


1.97996616 x 10" 


4 


1.97996616 x 10' 


4 






7.5 


2.96988644 x 10" 


4 


2.96988644 x 10' 


4 






2.5 


9.87990386 x 10" 


b 


9.87990386 x 10" 


b 




14516 


5.0 


1.97591049 X 10" 


4 


1.97591049 x 10" 


4 






7.5 


2.96383002 x 10" 


4 


2.96383002 x 10' 


4 






2.5 


9.74245572 x 10" 


b 


9.74245572 x 10' 


b 




8064 


5.0 


1.94820830 x 10" 


4 


1.94820830 x 10' 


4 






7.5 


2.92217500 x 10" 


4 


2.92217500 x 10' 


4 






2.5 


9.63593685 x 10" 


b 


9.63593685 x 10' 


b 


60 


11290 


5.0 


1.92682256 x 10" 


4 


1.92682256 x 10' 


4 






7.5 


2.89005780 x 10" 


4 


2.89005780 x 10' 


4 






2.5 


9.67669396 x 10" 


b 


9.67669396 x 10" 


D 




14516 


5.0 


1.93545013 X 10" 


4 


1.93545013 X 10" 


4 






7.5 


2.90322987 x 10" 


4 


2.90322987 x 10' 


4 






2.5 


9.56333807 x 10" 


b 


9.56333807 x 10' 


b 




8064 


5.0 


1.91331921 X 10" 


4 


1.91331921 X 10' 


4 






7.5 


2.87027307 x 10" 


4 


2.87027307 x 10' 


4 






2.5 


9.49419202 x 10" 


b 


9.49419202 x 10' 


b 


100 


11290 


5.0 


1.89929524 x 10' 


4 


1.89929524 x 10' 


4 






7.5 


2.84918006 x 10" 


4 


2.84918006 x 10' 


4 






2.5 


9.47165868 x 10" 


b 


9.47165868 x 10' 


b 




14516 


5.0 


1.89427095 x 10" 


4 


1.89427095 x 10' 


4 






7.5 


2.84137686 x 10" 


4 


2.84137686 x 10' 


4 



Table 10: Analysis RMSE for different EnKF implementations applied to the QG129 
instance. All methods give similar results. 
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■L ^ ens 


Nobs 


STDens 


EnKFsher 


EnKFsvD 






5 


289.9 s 


1.8 h 




8064 


5.0 


293.5 s 


1.4 h 






7.5 


286.8 s 


1.9 h 






5 


303.2 s 


2.2 h 


20 


11290 


5.0 


302.2 s 


2.0 h 






7.5 


303.5 s 


2.9 h 






5 


315.7 s 


4.5 h 




14516 


5.0 


308.1 s 


3.8 h 






7.5 


309.1 s 


3.7 h 






5 


764.8 s 


4.1 h 




8064 


5.0 


795.9 s 


3.2 h 






7.5 


764.7 s 


3.9 h 






5 


838.0 s 


6.3 h 


60 


11290 


5.0 


832.6 s 


5.4 h 






7.5 


817.7 s 


6.7 h 






5 


910.7 s 


9.9 h 




14516 


5.0 


899.9 s 


9.3 h 






7.5 


864.8 s 


11.2 h 






5 


1381.5 s 


5.3 h 




8064 


5.0 


1360.3 s 


5.2 h 






7.5 


1397.9 s 


5.9 h 






5 


1492.4 s 


10.3 h 


100 


11290 


5.0 


1494.5 s 


8.9 h 






7.5 


1506.6 s 


6.7 h 






5 


1624.8 s 


13.7 h 




14516 


5.0 


1634.9 s 


12.9 h 






7.5 


1664.2 s 


14.9 h 



Table 11: Computational times for several EnKF implementations applied to the 
QG129 instance. Different numbers of ensemble members and numbers of observa- 
tions are considered. 

5 Conclusions and Future Work 

We propose a novel implementation of the EnKF based on an iterative application of 
the Sherman-Morrison formula. The algorithm exploits the special structure of the 
background error covariance matrix projected onto the observation space. The com- 
putational complexity of the new approach is equivalent to that of the best EnKF 
implementations available in the literature. Nevertheless, the performance (elapsed 
time) of most of the existing methods is strongly dependent from the condition 
Nobs S> Nens (the number of observations is large compared to the ensemble size); the 
performance of the new approach is not affected by this condition. In addition, the 
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term N^j^ is not presented in the upper-bound of the effort of the proposed method, 
which leads to better performance when the number of observations and of ensem- 
ble members are of similar magnitude (Nobs ~ Ncns)- To assess the accuracy and 
performance of the proposed implementation two standard test problems have been 
employed, namely, the Lorenz 96 and the quasi-geostrophic models. All EnKF im- 
plementations tested (Cholesky, SVD, Sherman-Morrison) provide virtually identical 
analyses. However, the proposed Sherman-Morrison approach is much faster than 
the others even when the number of observations is large with respect to the number 
of ensemble members (Nobs 2> Ngns)- The parallel version of the new algorithm has a 
theoretical complexity that grows only linearly with the number of observations, and 
is therefore well suited for implementation in large scale data assimilation systems. 
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